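Operating in the real world often requires agents to learn about a complex environment and apply this understanding to achieve a breadth of goals. This problem, known as goal-conditioned reinforcement learning (GCRL), becomes especially challenging for long-horizon goals. Current methods tackle it by augmenting goal-conditioned policies with graph-based planning algorithms. However, they struggle to scale to large, high-dimensional state spaces and rely on exploration mechanisms for efficiently collecting training data. In this work, we introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments so as to obtain a proficient policy. SFL leverages the ability of successor features (SF) to capture transition dynamics, using it to drive exploration by estimating state novelty and to enable high-level planning by abstracting the state space as a non-parametric, landmark-based graph. We further exploit SF to directly compute a goal-conditioned policy for inter-landmark traversal, which we use to execute plans to "frontier" landmarks at the edge of the explored state space. We show in our experiments on Minigrid and VizDoom that SFL enables efficient exploration of large, high-dimensional state spaces and outperforms state-of-the-art baselines on long-horizon GCRL tasks.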
translated by Google Translate
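To make the role of successor features more concrete, the following minimal sketch computes state-level SFs for a fixed policy in a tiny tabular chain and compares states by SF distance. It is an illustrative simplification, not the SFL implementation: the three-state transition matrix `P`, the one-hot features `phi`, the discount `gamma`, and the helper names `successor_features` and `sf_distance` are all assumptions introduced here.

```python
def successor_features(P, phi, gamma=0.9, iters=500):
    """Iterate psi <- phi + gamma * P @ psi for a fixed policy.

    P[s][t] is the probability of moving from state s to state t under the
    policy, and phi[s] is the feature vector of state s (both hypothetical).
    """
    n, d = len(phi), len(phi[0])
    psi = [row[:] for row in phi]
    for _ in range(iters):
        psi = [
            [phi[s][k] + gamma * sum(P[s][t] * psi[t][k] for t in range(n))
             for k in range(d)]
            for s in range(n)
        ]
    return psi


def sf_distance(psi, s, t):
    """Euclidean distance between the SFs of two states (an assumed similarity measure)."""
    return sum((a - b) ** 2 for a, b in zip(psi[s], psi[t])) ** 0.5


# Toy chain 0 -> 1 -> 2, where state 2 is absorbing; features are one-hot.
P = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
phi = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
psi = successor_features(P, phi)
```

With one-hot features, `psi[s][k]` is the expected discounted number of visits to state `k` when starting from `s`, which is why distances between SF vectors reflect how similarly two states evolve under the policy.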
From smoothly pursuing moving objects to rapidly shifting gazes during visual search, humans employ a wide variety of eye movement strategies in different contexts. While eye movements provide a rich window into mental processes, building generative models of eye movements is notoriously difficult, and to date the computational objectives guiding eye movements remain largely a mystery. In this work, we tackled these problems in the context of a canonical spatial planning task, maze-solving. We collected eye movement data from human subjects and built deep generative models of eye movements using a novel differentiable architecture for gaze fixations and gaze shifts. We found that human eye movements are best predicted by a model that is optimized not to perform the task as efficiently as possible but instead to run an internal simulation of an object traversing the maze. This not only provides a generative model of eye movements in this task but also suggests a computational theory for how humans solve the task, namely that humans use mental simulation.
Yes. In this paper, we investigate strong lottery tickets in generative models, the subnetworks that achieve good generative performance without any weight update. Neural network pruning is considered the main cornerstone of model compression for reducing the costs of computation and memory. Unfortunately, pruning a generative model has not been extensively explored, and all existing pruning algorithms suffer from excessive weight-training costs, performance degradation, limited generalizability, or complicated training. To address these problems, we propose to find a strong lottery ticket via moment-matching scores. Our experimental results show that the discovered subnetwork can perform similarly to or better than the trained dense model even when only 10% of the weights remain. To the best of our knowledge, we are the first to show the existence of strong lottery tickets in generative models and provide an algorithm to find them stably. Our code and supplementary materials are publicly available.
We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video into spatial-temporal visual tokens and propose an embedding method for masked video token modeling to facilitate multi-task learning. We conduct extensive experiments to demonstrate the quality, efficiency, and flexibility of MAGVIT. Our experiments show that (i) MAGVIT performs favorably against state-of-the-art approaches and establishes the best-published FVD on three video generation benchmarks, including the challenging Kinetics-600. (ii) MAGVIT outperforms existing methods in inference time by two orders of magnitude against diffusion models and by 60x against autoregressive models. (iii) A single MAGVIT model supports ten diverse generation tasks and generalizes across videos from different visual domains. The source code and trained models will be released to the public at https://magvit.cs.cmu.edu.
Few-shot segmentation (FSS) aims to segment a target class with a small number of labeled images (the support set). To extract information relevant to the target class, a dominant approach in the best-performing FSS baselines removes background features using the support mask. We observe that this support mask presents an information bottleneck in several challenging FSS cases, e.g., for small targets and/or inaccurate target boundaries. To this end, we present a novel method (MSI), which maximizes the support-set information by exploiting two complementary sources of features in generating super correlation maps. We validate the effectiveness of our approach by instantiating it into three recent and strong FSS baselines. Experimental results on several publicly available FSS benchmarks show that our proposed method consistently improves the performance by visible margins and allows faster convergence. Our code and models will be publicly released.
Semi-supervised anomaly detection is a common problem, as the datasets containing anomalies are often only partially labeled. We propose a canonical framework, Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling (SPADE), that is not limited by the assumption that labeled and unlabeled data come from the same distribution. Indeed, this assumption is violated in many applications: for example, the labeled data may contain only anomalies unlike the unlabeled data, the unlabeled data may contain different types of anomalies, or the labeled data may contain only 'easy-to-label' samples. SPADE utilizes an ensemble of one-class classifiers as the pseudo-labeler to improve the robustness of pseudo-labeling under distribution mismatch. Partial matching is proposed to automatically select the critical hyper-parameters for pseudo-labeling without validation data, which is crucial when labeled data is limited. SPADE shows state-of-the-art semi-supervised anomaly detection performance across a wide range of scenarios with distribution mismatch in both tabular and image domains. In some common real-world settings, such as a model facing new types of unlabeled anomalies, SPADE outperforms the state-of-the-art alternatives by 5% AUC on average.
Online Temporal Action Localization (On-TAL) aims to immediately provide action instances from untrimmed streaming videos. The model is not allowed to utilize future frames or any processing techniques to modify past predictions, making On-TAL much more challenging. In this paper, we propose a simple yet effective framework, termed SimOn, that learns to predict action instances using the popular Transformer architecture in an end-to-end manner. Specifically, the model takes the current frame feature as a query and a set of past context information as keys and values of the Transformer. Unlike prior work that uses a set of outputs of the model as past contexts, we leverage the past visual context and the learnable context embedding for the current query. Experimental results on the THUMOS14 and ActivityNet1.3 datasets show that our model remarkably outperforms the previous methods, achieving a new state-of-the-art On-TAL performance. In addition, the evaluation for Online Detection of Action Start (ODAS) demonstrates the effectiveness and robustness of our method in the online setting. The code is available at https://github.com/TuanTNG/SimOn
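Volumetric images from magnetic resonance imaging (MRI) provide invaluable information for the preoperative staging of rectal cancer. Above all, accurate preoperative discrimination between the T2 and T3 stages is arguably the most challenging and clinically significant task in rectal cancer treatment, as chemotherapy is usually recommended for patients with T3 (or greater) stage cancer. In this study, we present a volumetric convolutional neural network that accurately discriminates T2 from T3 stage rectal cancer using rectal MR volumes. Specifically, we propose 1) a custom ResNet-based volume encoder that models the inter-slice relationship with late fusion (i.e., 3D convolution at the last layer), 2) a bilinear computation that aggregates the features produced by the encoder into a volume-wise feature, and 3) a joint minimization of triplet loss and focal loss. Using MR volumes of pathologically confirmed T2/T3 rectal cancer, we perform extensive experiments to compare various designs within the residual learning framework. As a result, our network achieves an AUC of 0.831, which is higher than the accuracy reached by groups of professional radiologists. We believe this method can be extended to other volume analysis tasks.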
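The joint objective in item 3) can be illustrated with a minimal sketch that adds a binary focal loss to a triplet loss. This is only one plausible reading of "joint minimization": the abstract does not state the weighting or hyperparameters, so `gamma`, `alpha`, `margin`, `weight`, and the function names below are assumptions chosen for illustration.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for predicted probability p of class 1 and label y in {0, 1}.

    gamma and alpha are commonly used defaults, assumed here.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = max(p_t, 1e-12)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def joint_loss(p, y, anchor, positive, negative, weight=1.0):
    """Sum of the two losses; the relative weight is a hypothetical parameter."""
    return focal_loss(p, y) + weight * triplet_loss(anchor, positive, negative)
```

The focal term down-weights examples that are already classified confidently, while the triplet term pulls embeddings of the same stage together and pushes different stages apart by at least `margin`.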
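Recently, there has been growing academic interest in time-varying knowledge graphs, or temporal knowledge graphs (TKG). Previous research has proposed diverse approaches to TKG reasoning that use historical information. However, less attention has been paid to the hierarchies within such information at different timestamps. Given that a TKG is a time-ordered sequence of knowledge graphs, the chronology of the sequence induces hierarchies between the graphs. Furthermore, each knowledge graph has its own hierarchical level, which may differ from one graph to another. To address these hierarchical characteristics of TKG, we propose HyperVC, which utilizes hyperbolic space, which encodes hierarchies better than Euclidean space. The chronological structure between knowledge graphs at different timestamps is represented by embedding the knowledge graphs as vectors in a common hyperbolic space. In addition, the diverse hierarchical levels of knowledge graphs are represented by adjusting the curvatures of the hyperbolic embeddings of their entities and relations. Experiments on four benchmark datasets show substantial improvements, especially on datasets with a higher hierarchical level.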
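As a rough illustration of how curvature affects hyperbolic embeddings, the sketch below computes the geodesic distance in a Poincaré ball of curvature `-c`. The abstract does not say which hyperbolic model HyperVC uses, so the choice of the Poincaré ball, the function name `poincare_distance`, and the parameter `c` are assumptions made for this example.

```python
import math


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between x and y in the Poincare ball of curvature -c (c > 0).

    Points must satisfy c * ||x||^2 < 1. The model choice is an assumption.
    """
    def sq_norm(v):
        return sum(a * a for a in v)

    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq_norm(x)) * (1.0 - c * sq_norm(y))
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)


origin = [0.0, 0.0]
near = poincare_distance(origin, [0.5, 0.0])
far = poincare_distance(origin, [0.9, 0.0])
```

Distances grow rapidly toward the boundary of the ball, which leaves room for the many leaves of a tree-like hierarchy. As `c` approaches 0 the geometry becomes nearly flat, so a tunable curvature gives one way to match an embedding space to how hierarchical a graph is.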
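Online stereo adaptation tackles the domain shift problem caused by the different environments of synthetic (training) and real (test) datasets, in order to promptly adapt stereo models in dynamic real-world applications such as autonomous driving. However, previous methods often fail to handle specific regions related to dynamic objects, which undergo more severe environmental changes. To mitigate this issue, we propose to incorporate an auxiliary point-selective network into a meta-learning framework, called PointFix, to provide a robust initialization of stereo models for online stereo adaptation. In a nutshell, our auxiliary network learns to fix local variants by effectively back-propagating local information through the meta-gradient, leading to a robust initialization of the baseline model. The network is model-agnostic, so it can be used with any kind of architecture in a plug-and-play manner. We conduct extensive experiments to verify the effectiveness of our method under three adaptation settings: short-, mid-, and long-term sequences. Experimental results show that the proper initialization of the base stereo model by the auxiliary network enables our learning paradigm to achieve state-of-the-art performance at inference.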